Abstract — A novel lossless source coding paradigm applies
to problems of unreliable lossless channels with low bit rates, 
in which a vital message needs to be transmitted prior to 
termination of communications. This paradigm can be applied 
to Alfred Renyi's secondhand account of an ancient siege in 
which a spy was sent to scout the enemy but was captured. 
After escaping, the spy returned to his base in no condition to 
speak and unable to write. His commander asked him questions 
that he could answer by nodding or shaking his head, and the 
fortress was defended with this information. Renyi told this story 
with reference to prefix coding, but maximizing probability of 
survival in the siege scenario is distinct from yet related to 
the traditional source coding objective of minimizing expected 
codeword length. Rather than finding a code minimizing
expected codeword length $\sum_{i=1}^{n} p(i)\, l(i)$, the siege problem involves
maximizing $\sum_{i=1}^{n} p(i)\, \theta^{l(i)}$ for a known $\theta \in (0, 1)$. When there
are no restrictions on codewords, this problem can be solved
using a known generalization of Huffman coding. The optimal 
solution has coding bounds which are functions of Renyi entropy; 
in addition to known bounds, new bounds are derived here. 
The alphabetically constrained version of this problem has 
applications in search trees and diagnostic testing. A novel
dynamic programming algorithm, based upon the oldest known
algorithm for the traditional alphabetic problem, optimizes
this problem in $O(n^3)$ time and $O(n^2)$ space, whereas two novel
approximation algorithms can find a suboptimal solution faster:
one in linear time, the other in $O(n \log n)$. Coding bounds for
the alphabetic version of this problem are also presented. 

I. Introduction 

Alfred Renyi related an ancient scenario in which the 
Romans held rebels under siege, rebels whose only hope 
was the knowledge gathered by a mute, illiterate spy, one 
who could only nod and shake his head [1, pp. 13-14]. This 
apocryphal tale — based upon a historical siege — is the 
premise behind the Hungarian version of the spoken parlor 
game Twenty Questions. A modern parallel in the 21st century 
occurred when Russian forces gained the knowledge needed 
to defeat hostage-takers by asking hostages "yes" or "no" 
questions over mobile phones [2], [3]. 

Renyi presented this problem in narrative form in order 
to motivate the relation between Shannon entropy and binary 
source coding. Note however that Twenty Questions, source 
coding, and the siege scenario actually have three different 
objectives. In Twenty Questions, the goal is to be able to 
determine an item (or message) by asking at most twenty 
questions. In source coding, the goal is to minimize the 
expected number of questions — or, equivalently, bits — 
necessary to determine the message. For the siege scenario,
the goal is survival: assuming partial information is not
useful, the besieged would wish to maximize the probability
that the message is successfully transmitted within a certain 
window of opportunity. When this window closes and the 
siege ends, the information becomes worthless. An analogous 
situation occurs when a wireless device is temporarily within 
range of a base station; one can safely assume that the channel, 
when available, will transmit at the lowest (constant) bitrate, 
and will be lost at a nondeterministic time after its availability. 

We consider this modified source coding problem and 
derive properties of and algorithms for the optimization of 
the problem and variants thereof. In Section II we formalize
the problem and find its solution in a generalization of the
Huffman coding algorithm previously used for a complementary
problem. Section III concerns several extensions and
variants of the problem. In particular, restricting the solution
space to alphabetic codes is considered in Section IV, with a
dynamic programming algorithm presented for optimizing the
alphabetic code, one that extends to the related problem of
search trees. In Section V we consider entropy bounds in the
form of Renyi entropy for the unrestricted problem, leading
to a new bound and a related property involving the length of
the shortest codeword of an optimal code. Entropy bounds for
the alphabetic problem, along with linear-time approximation
algorithms, are derived in Section VI. Section VII concludes
with related work and a possible future direction. 

II. Formalizing the Problem 

A message is represented by symbol X drawn from the
alphabet $\mathcal{X} = \{1, 2, \ldots, n\}$. Symbol i has probability p(i),
defining probability mass function p, known to both sender and
receiver. The source symbols are coded into binary codewords,
each bit of which is equivalent to an answer to a previously
agreed-upon "yes" or "no" question; the meaning of each
question (bit context) is implied by the previous answers
(bits), if any, in the current codeword. Each codeword c(i),
corresponding to symbol i, has length l(i), defining overall
length vector l and overall code C.

Let $\mathcal{L}_n$ be the set of allowable codeword length vectors,
those that satisfy the Kraft inequality, that is,

$$\mathcal{L}_n = \left\{ l \in \mathbb{Z}_+^n \text{ such that } \sum_{i=1}^{n} 2^{-l(i)} \le 1 \right\}.$$

Furthermore, assume that the duration of the window of
opportunity is independent of the communicated message
and is memoryless. Memorylessness implies that the window
duration is distributed exponentially. Therefore, quantizing
time in terms of the number of bits T that we can send within
our window,

$$P(T = t) = (1 - \theta)\,\theta^{t}, \quad t = 0, 1, 2, \ldots$$

with known parameter $\theta < 1$. We then wish to maximize the
probability of success, i.e., the probability that the message
length does not exceed the quantized window length:

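$$P[l(X) \le T] = \sum_{t=0}^{\infty} P(T = t) \cdot P[l(X) \le t] = \sum_{t=0}^{\infty} (1 - \theta)\,\theta^{t} \sum_{i=1}^{n} p(i)\, 1_{l(i) \le t} = \sum_{i=1}^{n} p(i) \sum_{t=l(i)}^{\infty} (1 - \theta)\,\theta^{t} = \sum_{i=1}^{n} p(i)\,\theta^{l(i)}$$

where $1_{l(i) \le t}$ is 1 if l(i) ≤ t, 0 otherwise. The problem is thus
the following optimization:

$$\max_{l \in \mathcal{L}_n} P[l(X) \le T] = \max_{l \in \mathcal{L}_n} \sum_{i=1}^{n} p(i)\,\theta^{l(i)}. \qquad (1)$$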
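As a quick numerical sanity check of the interchange of sums leading to (1), the Python sketch below compares the closed form with a truncated version of the defining series under the geometric window model above. The function names and the truncation point `t_max` are illustrative choices of ours; the example data are the Benford probabilities (5) and the θ = 0.9 lengths quoted in Section V.

```python
import math


def success_probability(p, lengths, theta):
    """Closed form from (1): sum_i p(i) * theta**l(i)."""
    return sum(pi * theta ** li for pi, li in zip(p, lengths))


def success_probability_series(p, lengths, theta, t_max=2000):
    """Truncated defining series: sum_{t=0}^{t_max} P(T = t) * P[l(X) <= t],
    with geometric window length P(T = t) = (1 - theta) * theta**t."""
    total = 0.0
    for t in range(t_max + 1):
        p_window = (1.0 - theta) * theta ** t
        p_fits = sum(pi for pi, li in zip(p, lengths) if li <= t)
        total += p_window * p_fits
    return total


# Illustrative data: Benford probabilities (5) and the lengths quoted in Section V.
benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
lengths = [2, 2, 3, 3, 4, 4, 4, 5, 5]
print(success_probability(benford, lengths, 0.9))
print(success_probability_series(benford, lengths, 0.9))
```

Truncating at `t_max` drops a tail of at most θ^(t_max + 1), which is negligible here.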



To maximize this probability of success, we use a general- 
ization of Huffman coding developed independently by Hu et 
al. [4, p. 254], Parker [5, p. 485], and Humblet [6, p. 25], [7, 
p. 231]. The bottom-up algorithm of Huffman coding starts
out with n weights of the form w(i) = p(i) and combines the
two least probable symbols x and y into a two-node subtree;
for algorithmic reduction, this subtree of combined weights is
subsequently considered as one symbol with weight (combined
probability) w(x) + w(y). (We use the term "weights" because
one can turn a problem of rational probabilities into one of
integer weights for implementation.) Reducing the problem to
one with one fewer item, the process continues recursively
until all items are combined into a single code tree. The
generalization of Huffman coding used to maximize (1) instead
assigns the weight

$$\theta \cdot (w(x) + w(y))$$

to the root node of the subtree of merged items. With this 
modified combining rule, the algorithm proceeds in a similar 
manner as Huffman coding, yielding a code with optimal 
probability of success. 
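The combining rule can be sketched in a few lines of Python. This is an illustrative implementation under our own assumptions (floating-point probabilities, a binary heap with an insertion counter to break ties), not code from [4]-[7]: each step pops the two smallest weights w' and w'', pushes θ(w' + w''), and lengthens by one bit every symbol below the new node.

```python
import heapq
import itertools
import math


def generalized_huffman_lengths(p, theta):
    """Codeword lengths from the Huffman-like algorithm that merges the two
    smallest weights w1, w2 into a single item of weight theta * (w1 + w2)."""
    n = len(p)
    if n == 1:
        return [0]
    counter = itertools.count()  # tie-breaker so tuples never compare the lists
    heap = [(w, next(counter), [i]) for i, w in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * n
    while len(heap) > 1:
        w1, _, leaves1 = heapq.heappop(heap)
        w2, _, leaves2 = heapq.heappop(heap)
        merged = leaves1 + leaves2
        for i in merged:
            lengths[i] += 1  # every leaf under the new node moves one level down
        heapq.heappush(heap, (theta * (w1 + w2), next(counter), merged))
    return lengths


# Benford probabilities (5), used in the examples of Section V.
benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
print(generalized_huffman_lengths(benford, 0.9))
print(generalized_huffman_lengths(benford, 0.6))
```

With θ = 1 the rule reduces to ordinary Huffman coding; for the Benford distribution the two calls above reproduce the optimal lengths reported in Section V for θ = 0.9 and θ = 0.6.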

III. Related Problems 

Note that if we use this probability of success as a tiebreaker
among codes with minimal expected length (those
optimal under the traditional measure of coding), the solution
is unique and independent of the value of θ, a straightforward
consequence of [8]. We can obtain this optimal code by using
the top-merge variation of Huffman coding given in [9]; this
variation views combined items as "smaller" than individual
items of the same weight. Similarly, for θ sufficiently near
1 (i.e., if the amount of information to be communicated
is small compared to the size of the window in question),
the optimal solution is identical to this top-merge solution, a
straightforward result analogous to that noted in [10, p. 222].
Thus traditional Huffman coding should be used if the window
size is expected to be far larger than the message size.

Observe also that, if we change P(T = 0)
without changing the ratios between the other probabilities,
the problem's solution code does not change, even though
the probability of success does. There are still more criteria
that are identically optimized, including, if we have several
independent messages serially transmitted, maximizing the
number of messages expected to be sent within a window.
Another problem arises if we have a series of windows with
independent instances of the problem and want to minimize
the expected number of windows needed for success. The
maximization of probability minimizes this number, which is
the reciprocal of the probability of success in each window:

$$\frac{1}{\sum_{i=1}^{n} p(i)\,\theta^{l(i)}}.$$

Note that this is a risk-loving objective, in that we are more 
willing than in standard coding to trade off having longer 
codewords for unlikely items for having shorter codewords 
for likely items. 

However, if the message to send is constant across all
windows rather than independent, the expected number of
windows needed (assuming it is necessary to restart
communication for each window) is instead

$$\sum_{i=1}^{n} p(i)\,\theta^{-l(i)}.$$

This is a risk-averse objective, in that we are less willing to 
make the aforementioned tradeoff than in standard coding. 
These distinct objective functions can be combined into one if 
we normalize, that is, if we seek to minimize the penalty function

$$L_\theta(p, l) = \log_\theta \sum_{i=1}^{n} p(i)\,\theta^{l(i)} \qquad (2)$$

for θ > 0 (θ ≠ 1), where minimizing expected length is the limit
case of θ → 1; the risk-loving objective corresponds to θ < 1
and the risk-averse one to base 1/θ > 1. Campbell first noticed this in [11]. Others
later found that the aforementioned generalized Huffman-like
algorithm optimizes this for all θ > 0, though previously only
θ > 1 had any known application.
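To make these criteria concrete, the following sketch (helper names are hypothetical) evaluates them for a fixed length vector: the expected number of windows with independent messages, the expected number with one repeated message, and the penalty (2). Under the definitions above, θ raised to L_θ(p, l) gives back the success probability, base 1/θ corresponds to the repeated-message criterion, and a base near 1 approaches the expected length.

```python
import math


def expected_windows_independent(p, lengths, theta):
    """Fresh independent message per window: reciprocal of the success probability."""
    return 1.0 / sum(pi * theta ** li for pi, li in zip(p, lengths))


def expected_windows_repeated(p, lengths, theta):
    """Same message resent from scratch in every window: sum_i p(i) * theta**(-l(i))."""
    return sum(pi * theta ** (-li) for pi, li in zip(p, lengths))


def penalty(p, lengths, base):
    """Penalty (2): log_base sum_i p(i) * base**l(i), for base > 0, base != 1."""
    return math.log(sum(pi * base ** li for pi, li in zip(p, lengths)), base)


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
lengths = [2, 2, 3, 3, 4, 4, 4, 5, 5]
theta = 0.9
print(expected_windows_independent(benford, lengths, theta))
print(expected_windows_repeated(benford, lengths, theta))
print(penalty(benford, lengths, 1 + 1e-6), sum(pi * li for pi, li in zip(benford, lengths)))
```

The repeated-message count is never smaller than the independent one, since E[θ^(-l)]·E[θ^l] ≥ 1 by the Cauchy-Schwarz inequality.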

IV. Alphabetic Codes 

Under siege, assuming the absence of a predetermined 
code, using the optimal Huffman-like code would likely be 
impractical, since one would need to account not only for the
time taken to answer a question, but also for the time needed to ask
it. In this case, and in applications such as search trees and testing
for faulty devices in a sequential input-output system [12]
(assuming the answer remains binary), each question should
be of the form, "Is the output greater than x?" where x is
one of the possible symbols, a symbol we call the splitting
point for the corresponding node. This restriction is equivalent
to the constraint that c(j) ≺ c(k) whenever j < k, where
codewords c(·) are compared using lexicographical order. The
dynamic programming algorithm of Gilbert and Moore [13]
can be adapted to this restricted problem.

The key to the modified algorithm is to note that any optimal 
coding tree must have all its subtrees optimal. Since there are 
n − 1 possible splitting points, if we know all potential optimal
subtrees for all possible ranges, the splitting point can be
found through sequential search of the possible combinations.
The optimal tree is thus found inductively, and this algorithm
has $O(n^3)$ time complexity and $O(n^2)$ space complexity.
The dynamic programming algorithm involves finding the
maximum tree weight $w_{j,k}$ (and corresponding optimum tree)
for items j through k for each value of k − j from 0 to n − 1,
computing inductively, starting with $w_{j,j} = w(j)$ (= p(j)),
with

$$w_{j,k} = \theta \max_{j \le i < k} \left( w_{j,i} + w_{i+1,k} \right)$$

for j < k. Knuth showed how the traditional linear version
of this approach can be extended to general search trees [14]; 
for the siege scenario, this is a straightforward generalization, 
which we omit here for brevity. For answers having unequal 
cost, algorithms analogous to the linear-objective ones given 
in [15], [16] are similarly formulated. 
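A direct Python rendering of the recurrence above, offered as a sketch rather than a reference implementation: `optimal_alphabetic` (a name of ours) fills the O(n^2) table of subtree weights w_{j,k} in order of increasing k − j, tries all O(n) splitting points per entry for O(n^3) time overall, and recovers codeword lengths from the stored splitting points. Items are indexed from 0 in the code.

```python
import math


def optimal_alphabetic(p, theta):
    """Dynamic program for the alphabetic siege problem.
    w[j][k] holds the best subtree weight for items j..k (0-based), with
    w[j][j] = p(j) and w[j][k] = theta * max_{j <= i < k} (w[j][i] + w[i+1][k]).
    Returns (max sum_i p(i) * theta**l(i) over alphabetic codes, lengths)."""
    n = len(p)
    w = [[0.0] * n for _ in range(n)]    # O(n^2) space
    split = [[0] * n for _ in range(n)]  # split[j][k]: last item of the left subtree
    for j in range(n):
        w[j][j] = p[j]
    for d in range(1, n):                # d = k - j, smaller ranges first
        for j in range(n - d):
            k = j + d
            best = max(range(j, k), key=lambda i: w[j][i] + w[i + 1][k])
            w[j][k] = theta * (w[j][best] + w[best + 1][k])
            split[j][k] = best
    lengths = [0] * n

    def assign(j, k, depth):
        # Walk the stored splitting points to read off each leaf's depth.
        if j == k:
            lengths[j] = depth
            return
        assign(j, split[j][k], depth + 1)
        assign(split[j][k] + 1, k, depth + 1)

    assign(0, n - 1, 0)
    return w[0][n - 1], lengths


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
print(optimal_alphabetic(benford, 0.9))
```

Because the Benford lengths reported in Section V are nondecreasing (and hence already alphabetic), the dynamic program attains the same success probability there as the unrestricted code.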

Another contribution of Knuth in [14] was to reduce algo- 
rithmic complexity for the linear version using the fact that 
the splitting point of an optimal tree must be between the
splitting points of the optimal trees for the two ranges of size
n' − 1 obtained by dropping the first or the last item of its
n'-item range. With the siege problem, this property no longer holds;
a counterexample to this is θ = 0.6 with weights (8, 1, 9, 6).
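The counterexample can be checked with the same recurrence. The sketch below (a hypothetical helper; integer weights stand in for probabilities, which only rescales the objective) records the optimal splitting point for every range, using 0-based item indices.

```python
def root_splits(weights, theta):
    """split[j][k] = last item (0-based) of the left subtree at the root of an
    optimal alphabetic tree for items j..k, using the recurrence
    w[j][k] = theta * max_{j <= i < k} (w[j][i] + w[i+1][k])."""
    n = len(weights)
    w = [[0.0] * n for _ in range(n)]
    split = [[None] * n for _ in range(n)]
    for j in range(n):
        w[j][j] = float(weights[j])
    for d in range(1, n):
        for j in range(n - d):
            k = j + d
            best = max(range(j, k), key=lambda i: w[j][i] + w[i + 1][k])
            w[j][k] = theta * (w[j][best] + w[best + 1][k])
            split[j][k] = best
    return split


split = root_splits([8, 1, 9, 6], 0.6)
# Knuth's monotonicity would require split[0][2] <= split[0][3] <= split[1][3].
print(split[0][2], split[0][3], split[1][3])  # prints: 1 0 2
```

In 1-based terms, the full tree splits after item 1, whereas the optimal trees for items 1 through 3 and 2 through 4 split after items 2 and 3, so the full-tree split does not lie between them.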

Similarly, for the linear problem [17], as well as for θ > 1
and some nonexponential problems [4], there is a well-known
procedure, the Hu-Tucker algorithm, for finding
an optimal alphabetic solution in O(n log n) time and linear
space. The corresponding algorithm for θ < 1 fails, however,
this time for θ = 0.6 and weights (8, 1, 9, 6, 2). The approximation
algorithms presented in Section VI, though, have similar or
lesser complexity.

V. Bounds on Optimal Codes

Returning to the general (nonalphabetic) case, it is often 
useful to come up with bounds on the performance of the 
optimal code. In this section, we assume without loss of 
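generality that p(1) ≥ p(2) ≥ ⋯ ≥ p(n). Note that
θ ≤ 0.5 is a trivial case, always solved by a unary code,
C_u = (0, 10, 110, ..., 11⋯10, 11⋯11), since then a merged weight
θ(w′ + w″) never exceeds the larger of the two merged weights, so the
merged item is always among the two smallest at the next step. For nontrivial θ > 0.5,
there is a relationship between the problem and Renyi
entropy.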

Campbell first proposed a decaying exponential utility func- 
tion for coding in [18]. He observed a simple upper bound for 
(1) with θ > 0.5 in [18] and alluded to a lower bound in [19].



These bounds are similar to the well-known Shannon entropy 
bounds for Huffman coding (e.g., [20, pp. 87-88], [21]). In this 
case, however, the bounds involve Renyi's α-entropy [22], not
Shannon's. Renyi entropy is

$$H_\alpha(p) = \frac{1}{1-\alpha} \log_2 \sum_{i=1}^{n} p(i)^\alpha$$

where, in this case,

$$\alpha = \frac{1}{1 + \log_2 \theta}.$$

For nontrivial maximizations (θ ∈ (0.5, 1)),

$$\theta^{H_\alpha(p)+1} < \max_{l \in \mathcal{L}_n} P[l(X) \le T] \le \theta^{H_\alpha(p)}. \qquad (3)$$

We can rephrase this using the definition of $L_\theta(p, l)$ in (2) as

$$0 \le \min_{l \in \mathcal{L}_n} L_\theta(p, l) - H_\alpha(p) < 1, \qquad (4)$$

a similar result to the traditional coding bound [21]. Inequality (4)
also holds for the minimization problem of θ > 1.

As an example of these bounds, consider the probability 
distribution implied by Benford's law [23], [24]: 

$$p(i) = \log_{10}(i + 1) - \log_{10}(i), \quad i = 1, 2, \ldots, 9. \qquad (5)$$

At θ = 0.9, for example, $H_\alpha(p) \approx 2.822$, so the optimal
code will have between a 0.668 and 0.743 chance of success.
Running the algorithm, the optimal lengths are
l = (2, 2, 3, 3, 4, 4, 4, 5, 5), resulting in a probability of success of
0.739.
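The numbers in this example can be reproduced with a short Python sketch of α, H_α(p), and the two sides of (3). The helper names are ours; the code simply evaluates the formulas above, assuming θ ∈ (0.5, 1).

```python
import math


def renyi_alpha(theta):
    """alpha = 1 / (1 + log2 theta); alpha > 1 for theta in (0.5, 1)."""
    return 1.0 / (1.0 + math.log2(theta))


def renyi_entropy(p, alpha):
    """H_alpha(p) = log2(sum_i p(i)**alpha) / (1 - alpha), in bits."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)


def success_bounds(p, theta):
    """Return (H_alpha(p), theta**(H + 1), theta**H), the two sides of (3)."""
    h = renyi_entropy(p, renyi_alpha(theta))
    return h, theta ** (h + 1), theta ** h


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
print(success_bounds(benford, 0.9))
```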

More sophisticated bounds on the optimal solution for the
θ > 1 case were given in [25]; these appear as solutions
to related problems rather than in closed form. Closed-form
bounds given in [26] are functions of entropy (of degree α) and
p(1), as in the linear case [27]-[32]. These bounds are flawed,
however, in that they assume p(1) > 0.4 always implies an
optimal code exists with l(1) = 1. A simple counterexample
to this assumption is p = (0.55, 0.15, 0.15, 0.15) with θ = 2,
where l(i) = 2 for all i: the generalized algorithm first merges two
0.15 items into weight 0.6, then merges the remaining 0.15 with 0.55 into weight 1.4.

However, when θ < 1, because the multiplication step of
the generalized Huffman-like coding algorithm provides for a
strict reduction in weight, l(1) = 1 for any p(1) > 0.4. Here
we present better conditions for l(1) = 1 and show that they
are tight, then derive better entropy bounds from them.

Theorem 1: If p(1) > 2θ/(2θ + 3), then there is an optimal
code for p with l(1) = 1.

This is a generalization of [28] and is only slightly more 
complex to prove: 

Proof: Recall that the generalized Huffman algorithm
combines the items with the smallest weights, w′ and w″,
yielding a new item of weight w = θ(w′ + w″), and this
process is repeated on the new set of weights, the tree thus
constructed up from the leaves to the root. Consider the
step at which item 1 gets combined with other items; we
wish to prove that this is the last step. At the beginning
of this step the (possibly merged) items left to combine
are {1}, S_2, S_3, ..., S_k, where we use S_j to denote both a
(possibly merged) item of weight w(S_j) and the set of (single)
items combined to make item S_j. Since {1} is combined in
this step, all but one S_j has weight at least p(1). Recall too
that all weights w(S_j) must be less than or equal to the sums
of probabilities $\sum_{i \in S_j} p(i)$. Then

$$\frac{2\theta(k-1)}{2\theta+3} < (k-1)\,p(1) \le p(1) + \sum_{j=2}^{k} w(S_j) \le \sum_{i=1}^{n} p(i) = 1,$$

which, since θ > 0.5, means that k < 5 (the bound gives
k − 1 < 1 + 3/(2θ) < 4). Thus, because n < 4 is a trivial case,
we can consider the steps in generalized Huffman coding at
and after which four items remain, one of which is item {1}
and the others of which are S_2, S_3, and S_4. We show that,
if p(1) > 2θ/(2θ + 3), these items are combined as shown in Fig. 1.




Fig. 1. Tree in last steps of the generalized Huffman algorithm (merged node of weight w(S_3 ∪ S_4) = θ[w(S_3) + w(S_4)]).

We assume without loss of generality that weights w(S_2),
w(S_3), and w(S_4) are in descending order. From
$w(S_2) + w(S_3) + w(S_4) \le \sum_{i=2}^{n} p(i) < 3/(2\theta + 3)$, w(S_2) ≥ w(S_3),
and w(S_2) ≥ w(S_4), it follows that w(S_3) + w(S_4) < 2/(2θ + 3).
Consider set S_2. If its cardinality is 1, then w(S_2) ≤ p(1),
so the next step merges the least two weighted items S_3 and
S_4. Since the merged item has weight at most 2θ/(2θ + 3) < p(1), this
item can then be combined with S_2, then {1}, so that l(1) = 1.
If S_2 is a merged item, let us call the two items (sets) that
merged to form it S_2′ and S_2″, indicated by the dashed nodes
in Fig. 1. Because these were combined prior to this step,
w(S_2′) + w(S_2″) ≤ w(S_3) + w(S_4), so w(S_2) = θ[w(S_2′) + w(S_2″)] ≤
θ[w(S_3) + w(S_4)] < 2θ/(2θ + 3). Thus w(S_2), and by extension w(S_3)
and w(S_4), are at most p(1). So S_3 and S_4 can be combined
and this merged item can be combined with S_2, then {1},
again resulting in l(1) = 1. ∎

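This can be shown to be tight by noting that

$$p = \left( \frac{2\theta}{2\theta+3} - 3\epsilon,\; \frac{1}{2\theta+3} + \epsilon,\; \frac{1}{2\theta+3} + \epsilon,\; \frac{1}{2\theta+3} + \epsilon \right)$$

has optimal length vector l = (2, 2, 2, 2) for any $\epsilon \in \left(0, (2\theta - 1)(8\theta + 12)^{-1}\right)$.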
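For such tiny examples, optimality claims can be double-checked by exhaustive search. The sketch below is our own illustration: θ = 0.75 and ε = ±0.01 are arbitrary choices (at this θ, (2θ − 1)/(8θ + 12) ≈ 0.028), and the search minimizes the penalty (2) over all Kraft-satisfying length vectors with entries at most n − 1 (a full binary tree on n leaves has depth at most n − 1). With ε = 0.01 it should select (2, 2, 2, 2); flipping the sign of ε pushes p(1) above 2θ/(2θ + 3), and item 1 then lands at depth one. Because it minimizes (2) directly, the same search also handles θ > 1, such as the counterexample p = (0.55, 0.15, 0.15, 0.15) with θ = 2 given earlier in this section.

```python
import itertools
import math


def brute_force_optimal(p, theta):
    """Exhaustively minimize L_theta(p, l) = log_theta sum_i p(i) * theta**l(i), as in (2),
    over length vectors with 1 <= l(i) <= n - 1 that satisfy the Kraft inequality.
    Exponential in n: intended only for tiny examples such as n = 4."""
    n = len(p)
    best_cost, best_l = math.inf, None
    for l in itertools.product(range(1, n), repeat=n):
        if sum(2.0 ** -li for li in l) > 1.0:
            continue  # not a prefix-code length vector
        cost = math.log(sum(pi * theta ** li for pi, li in zip(p, l)), theta)
        if cost < best_cost - 1e-12:
            best_cost, best_l = cost, l
    return best_l


def tightness_example(theta, eps):
    """The four-symbol distribution from the tightness argument after Theorem 1."""
    q = 1.0 / (2 * theta + 3)
    return [2 * theta * q - 3 * eps, q + eps, q + eps, q + eps]


theta = 0.75
print(brute_force_optimal(tightness_example(theta, 0.01), theta))   # p(1) below the threshold
print(brute_force_optimal(tightness_example(theta, -0.01), theta))  # p(1) above the threshold
```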

Upper bounds on L_θ derived from Theorem 1 (equivalently, lower bounds
on the probability of success), although rather complicated, are improved.

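Corollary 1: For l(1) = 1 (and thus for all p(1) > 2θ/(2θ +
3)) and θ < 1, the following holds:

$$\max_{l \in \mathcal{L}_n} \sum_{i=1}^{n} p(i)\,\theta^{l(i)} > \theta\, p(1) + \theta^{2}\,(1 - p(1))\,\theta^{H_\alpha(\tilde{p})},$$

where $\tilde{p} = (p(2), \ldots, p(n))/(1 - p(1))$ is the normalized distribution of the remaining items.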

This is a straightforward consequence of Theorem 1, and a
proof is thus omitted for space. This bound is tight for
p(1) > 0.5, as p = (p(1), 1 − p(1) − ε, ε) gets arbitrarily close
for small ε.

Let us apply this result to the Benford distribution in (5)
for θ = 0.6. In this case, $H_\alpha(p) \approx 2.260$ and p(1) >
2θ/(2θ + 3), so an optimal code has l(1) = 1 and the probability of success
is between 0.251 and 0.315 = $\theta^{H_\alpha(p)}$; the simpler (inferior)
lower probability bound in (3) is 0.189. The optimal code
is l = (1, 2, 3, 4, 5, 6, 7, 8, 8), which yields a probability of
success of 0.296.
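The three numbers in this example can be recomputed as follows. This is a sketch under our reading of Corollary 1 as the bound θp(1) + θ²(1 − p(1))θ^{H_α(p̃)}, with p̃ the renormalized distribution of items 2 through n (that is, the lower bound (3) applied to the subtree hanging below the root); with θ = 0.6 it yields the 0.251 quoted above, alongside θ^{H_α(p)+1} and θ^{H_α(p)} from (3). Function names are ours.

```python
import math


def renyi_entropy(p, alpha):
    """H_alpha(p) = log2(sum_i p(i)**alpha) / (1 - alpha)."""
    return math.log2(sum(pi ** alpha for pi in p)) / (1.0 - alpha)


def corollary1_lower_bound(p, theta):
    """theta*p(1) + theta**2 * (1 - p(1)) * theta**H_alpha(tail), where tail holds
    p(2), ..., p(n) renormalized to sum to one and alpha = 1/(1 + log2 theta).
    Assumes p is sorted in nonincreasing order and theta in (0.5, 1)."""
    alpha = 1.0 / (1.0 + math.log2(theta))
    rest = 1.0 - p[0]
    tail = [pi / rest for pi in p[1:]]
    return theta * p[0] + theta ** 2 * rest * theta ** renyi_entropy(tail, alpha)


theta = 0.6
benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
alpha = 1.0 / (1.0 + math.log2(theta))
h = renyi_entropy(benford, alpha)
print(corollary1_lower_bound(benford, theta), theta ** (h + 1), theta ** h)
```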

VI. Approximation Algorithms and Bounds for Alphabetic Codes

Returning again to alphabetic codes, if the dynamic pro- 
gramming solution is too time- or space-consuming, an ap- 
proximation algorithm can be used. A simple approximation 
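algorithm involves adding one to each of the lengths of an
optimal nonalphabetic code; this yields lengths corresponding
to an alphabetic code, since $\sum_{i=1}^{n} 2^{-l(i)} \le 0.5$ is sufficient to
have an alphabetic code [33, p. 34], [12, p. 565]. Putting the
lengths into (2), and using $L_\theta(p, l + 1) = L_\theta(p, l) + 1$,

$$L^{\mathrm{opt}}_\theta(p) \le L^{\mathrm{alph}}_\theta(p) \le L^{\mathrm{opt}}_\theta(p) + 1,$$

where $L^{\mathrm{opt}}_\theta(p)$ is the cost of the optimal code for the
nonalphabetic problem and $L^{\mathrm{alph}}_\theta(p)$ that of the optimal alphabetic code.
Limits in terms of Renyi entropy follow from the previous section, and the
following improved approximation algorithm means that the right inequality is strict.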
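The identity behind the right-hand inequality is easy to confirm numerically: lengthening every codeword by one bit halves the Kraft sum and adds exactly one to L_θ. The Python helpers below are illustrative (names are ours) and use the θ = 0.9 Benford lengths from Section V.

```python
import math


def kraft_sum(lengths):
    """Left-hand side of the Kraft inequality, sum_i 2**(-l(i))."""
    return sum(2.0 ** -li for li in lengths)


def penalty(p, lengths, theta):
    """Penalty (2): log_theta sum_i p(i) * theta**l(i)."""
    return math.log(sum(pi * theta ** li for pi, li in zip(p, lengths)), theta)


def add_one(lengths):
    """Simple alphabetic approximation: lengthen every codeword by one bit."""
    return [li + 1 for li in lengths]


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
lengths = [2, 2, 3, 3, 4, 4, 4, 5, 5]
print(kraft_sum(add_one(lengths)))                                              # 0.5
print(penalty(benford, add_one(lengths), 0.9) - penalty(benford, lengths, 0.9))  # 1, up to rounding
```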

Approximation can be improved by utilizing techniques in 
[12] and [34]. The improved algorithm has two versions, one 
of which is linear time, using the Shannon-like lengths

$$l^{S}(i) = \left\lceil -\alpha \log_2 p(i) + \log_2 \sum_{j=1}^{n} p(j)^{\alpha} \right\rceil,$$

and one of which is O(n log n) (or linear if sorting weights
can be done in linear time).
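A sketch of the Shannon-like lengths in Python, assuming θ ∈ (0.5, 1) so that α > 1 and all p(i) > 0. Each length satisfies 2^(−l^S(i)) ≤ p(i)^α / Σ_j p(j)^α, so the Kraft inequality holds, and since l^S(i) is less than one more than its argument, the resulting success probability exceeds θ^{H_α(p)+1}.

```python
import math


def shannon_like_lengths(p, theta):
    """l_S(i) = ceil(-alpha * log2 p(i) + log2 sum_j p(j)**alpha), alpha = 1/(1 + log2 theta).
    Assumes 0.5 < theta < 1 and p(i) > 0 for all i; runs in linear time."""
    alpha = 1.0 / (1.0 + math.log2(theta))
    log_s = math.log2(sum(pj ** alpha for pj in p))
    return [math.ceil(-alpha * math.log2(pi) + log_s) for pi in p]


benford = [math.log10(i + 1) - math.log10(i) for i in range(1, 10)]
lengths = shannon_like_lengths(benford, 0.9)
print(lengths, sum(2.0 ** -li for li in lengths))
```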

Procedure for Finding a Near-Optimal Code 

1) Start with an optimal or near-optimal nonalphabetic
code, l^{non}, such as the Shannon-like l^{non} = l^{S} or the
Huffman-like l^{non} = l^{opt}.

2) Find the set of all minimal points, $\mathcal{M}$. A minimal point
is any i such that 1 < i < n, l(i) < l(i − 1), and
l(i) < l(i + 1). Additionally, if
l(i − 1) > l(i) = l(i + 1) = ⋯ = l(i + k) < l(i + k + 1), then, of these, only
the j ∈ [i, i + k] minimizing w(j) (or p(j)) is a minimal
point.

3) Assign a preliminary alphabetic code with lengths l^{pre}(i) = …
for all minimal points, and l^{pre}(i) = … for all other
items. This corresponds to an alphabetic code C^{pre}. Note
that such an alphabetic code is easy to construct; the first
codeword is l(1) zeros, and each additional codeword
c(i) is obtained by either truncating c(i − 1) to l(i) digits
and adding 1 to the binary representation (if l(i) ≤ l(i −
1)) or by adding 1 to the binary representation of c(i − 1)
and appending l(i) − l(i − 1) zeros (if l(i) > l(i − 1)).

4) Go through the code tree (with, e.g., a depth-first search), 
and replace any node having only one child with its 
grandchild or grandchildren. At the end of this process, 
an alphabetic code with $\sum_{i=1}^{n} 2^{-l(i)} = 1$ is obtained.
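Steps 3 and 4 can be sketched in Python as follows. This illustration takes the preliminary lengths as given (it does not implement the rule for choosing them at minimal points), builds codewords in order as described in step 3, raising an error if a carry runs out of bits, and then splices out every node with a single child, as in step 4. Purely for illustration, the example input is the θ = 0.9 Benford lengths from Section V plus one, i.e., the simple approximation above; function names are ours.

```python
def alphabetic_code(lengths):
    """Step 3: build c(1), c(2), ... in lexicographic order from a length vector.
    Raises ValueError if some codeword would need a carry out of its leading bit."""
    codes = ["0" * lengths[0]]
    for prev_len, cur_len in zip(lengths, lengths[1:]):
        prev = codes[-1]
        if cur_len <= prev_len:
            value = int(prev[:cur_len], 2) + 1  # truncate to l(i) digits, add 1
        else:
            value = (int(prev, 2) + 1) << (cur_len - prev_len)  # add 1, append zeros
        if value >= 2 ** cur_len:
            raise ValueError("these lengths admit no alphabetic code")
        codes.append(format(value, "0{}b".format(cur_len)))
    return codes


def collapse_single_children(codes):
    """Step 4: splice out every internal node with only one child, preserving
    the left-to-right (alphabetic) order of the leaves. Assumes a prefix-free input."""
    result = [None] * len(codes)

    def walk(items, prefix):
        # items: (symbol index, remaining suffix) for the leaves below this node
        if len(items) == 1 and items[0][1] == "":
            result[items[0][0]] = prefix
            return
        zeros = [(i, s[1:]) for i, s in items if s[0] == "0"]
        ones = [(i, s[1:]) for i, s in items if s[0] == "1"]
        if zeros and ones:
            walk(zeros, prefix + "0")
            walk(ones, prefix + "1")
        else:
            walk(zeros or ones, prefix)  # single child: drop this bit

    walk(list(enumerate(codes)), "")
    return result


preliminary = alphabetic_code([3, 3, 4, 4, 5, 5, 5, 6, 6])
print(preliminary)
print(collapse_single_children(preliminary))
```

For that input every preliminary codeword starts with 0, so step 4 removes the first bit throughout, giving back the nondecreasing lengths (2, 2, 3, 3, 4, 4, 4, 5, 5) with Kraft sum 1.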

This hybrid of the approaches of Nakatsu [34] and Yeung
[12] can be easily applied to all θ > 0, including the
linear limit case, for which it is an improved approximation
technique when l^{non} ≠ l^{…}.

VII. Related Work, Extensions, and Conclusion 

The algorithms presented here will not work if n = ∞,
although methods are known for finding codes for geometric
and lighter distributions [35] and existence results are known
for all finite-(Renyi) entropy distributions [36]. Also, although
presented here in binary form for simplicity's sake, nonalphabetic
results readily extend to D-ary codes [7], [18], [19].
The alphabetic algorithm extends in a manner akin to that
shown for the extension of the Gilbert and Moore algorithm
in [15, pp. 15-16]. Further upper bounds on optimal L_θ(p, l)
are elusive, but should be quite similar to those for the linear
case, at least for θ < 1, since the distributions approaching
or achieving these bounds should be of bounded cardinality 
almost everywhere. 

In conclusion, when Renyi's siege scenario is formalized, 
problem solutions involve Huffman coding, dynamic program- 
ming, and, appropriately, Renyi entropy. 